Processing “Computed” Texts
Abstract
This article compares methods for deriving texts to be typeset by a word processor. By ‘derive’, we mean that such texts are extracted from a larger structure, which can be viewed as a database. The current standard for such a structure is an XML-like format, and we give an overview of the tools available for this derivation task.
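To make the derivation task concrete, here is a minimal sketch of extracting selected texts from an XML structure and emitting them in a form a typesetter could consume. It is an illustration only, not one of the tools the article surveys: the element names (`entries`, `entry`, `title`, `body`), the `id` attribute, the sample data, and the choice of LaTeX as the output format are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical source structure; the element and attribute names are assumptions.
SAMPLE = """<entries>
  <entry id="a1"><title>First text</title><body>Hello, world.</body></entry>
  <entry id="b2"><title>Second text</title><body>Costs 5% more.</body></entry>
</entries>"""

# A small subset of LaTeX special characters and their escaped forms.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
}


def escape_latex(text):
    """Escape characters one at a time so no replacement is escaped twice."""
    return "".join(_LATEX_ESCAPES.get(ch, ch) for ch in text)


def derive_texts(xml_source, wanted_ids):
    """Extract the entries whose id is in wanted_ids and render each as a section."""
    root = ET.fromstring(xml_source)
    fragments = []
    for entry in root.iter("entry"):
        if entry.get("id") in wanted_ids:
            title = entry.findtext("title", default="")
            body = entry.findtext("body", default="")
            fragments.append(
                "\\section{%s}\n%s\n" % (escape_latex(title), escape_latex(body))
            )
    return "\n".join(fragments)


if __name__ == "__main__":
    print(derive_texts(SAMPLE, {"b2"}))
```

The selection step (filtering by `id`) stands in for whatever query the larger structure supports. A real pipeline might instead use XPath or XSLT, depending on the tool chosen.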
Related resources
Statistical Natural Language Processing Method for Variant Texts Segmentation
It is well known that techniques already exist for automatically subdividing texts into multiparagraph subtopic passages, such as the TextTiling methodology proposed by Hearst. However, an additional algorithm is needed to perform a similar task for parallel or variant texts, because ambiguous and complicated traces of cross citation among them might often generate some sinuous patt...
Computing semantic relatedness of words and texts in Wikipedia-derived semantic space
Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was either based on purely statistical techniques that did not make use of background knowledge or on huge manual efforts, such as the CYC projects. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for finegrai...
Automatic Text Decomposition and Structuring
Sophisticated text similarity measurements are used to determine relationships between natural-language texts and text segments. The resulting linked hypertext maps are used to identify different text types and text structures, leading to improved text access and utilization. Examples of text decomposition are given for expository and non-expository texts. The vector processing model of retriev...
Memory-based language processing: psycholinguistic research in the 1990s.
There are two main domains of research in psycholinguistics: sentence processing, concerned with how the syntactic structures of sentences are computed, and text processing, concerned with how the meanings of larger units of text are understood. In recent sentence processing research, a new and controversial theme is that syntactic computations may rely heavily on statistical information about ...
Accelerating Boyer Moore Searches on Binary Texts
The Boyer and Moore (BM) pattern matching algorithm is considered one of the best, but its performance is reduced on binary data. Yet searching in binary texts has important applications, such as compressed matching. The paper shows how, by means of some pre-computed tables, one may also implement the BM algorithm for the binary case without referring to bits, and processing only entire blo...